fix(turso): replace client.transaction() with UPDATE...RETURNING to prevent native connection leak in queue poller #41

Open
CalmProton wants to merge 1 commit into mizzle-dev:main from CalmProton:fix/turso-queue-connection-leak

Conversation

@CalmProton

Problem

When world.start() is called, @workflow-worlds/turso starts a queue poller that fires every 100 ms. On each tick, pollAndProcess() calls client.transaction('write') to atomically claim a message.

This causes a native SQLite connection leak that crashes the host process after 5–10 minutes of idle operation.

Root cause: @libsql/client's transaction() orphans the connection

Inside @libsql/client@0.17.2 (sqlite3.js:154-158):

async transaction(mode = "write") {
    const db = this.#getDb();
    executeStmt(db, transactionModeToBegin(mode), this.#intMode);
    this.#db = null; // A new connection will be lazily created on next use
    return new Sqlite3Transaction(db, this.#intMode);
}

After transaction() returns, this.#db is null and the Database object belongs to the Sqlite3Transaction. Once the transaction commits or rolls back and tx goes out of scope, nothing references db any more, so its native connection stays open until the garbage collector finalizes it. Meanwhile, the next #getDb() call (on the next poll tick) opens a brand new native SQLite connection.
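The handoff described above can be illustrated with a small self-contained model. This is a sketch only, not the real @libsql/client code: FakeDatabase, ModelClient, openCount and connectionsOpened are hypothetical names, and "opening a connection" is reduced to incrementing a counter.

```typescript
// Simplified model of the handoff pattern; NOT the real @libsql/client code.
// All names here are hypothetical and exist only for illustration.
class FakeDatabase {
  static openCount = 0; // total connections "opened" so far

  constructor() {
    FakeDatabase.openCount += 1;
  }
}

class ModelClient {
  private db: FakeDatabase | null = null;

  // Lazily open a connection and cache it, like #getDb() in the excerpt.
  private getDb(): FakeDatabase {
    if (this.db === null) {
      this.db = new FakeDatabase();
    }
    return this.db;
  }

  // execute() leaves the cached connection in place.
  execute(): void {
    this.getDb();
  }

  // transaction() hands the connection away and clears the cache,
  // so the next call has to open a new one.
  transaction(): FakeDatabase {
    const db = this.getDb();
    this.db = null;
    return db;
  }
}

// Count how many connections `ticks` poll iterations open under each strategy.
function connectionsOpened(ticks: number, useTransaction: boolean): number {
  const before = FakeDatabase.openCount;
  const client = new ModelClient();
  for (let i = 0; i < ticks; i++) {
    if (useTransaction) {
      client.transaction();
    } else {
      client.execute();
    }
  }
  return FakeDatabase.openCount - before;
}
```

In this model, connectionsOpened(100, true) returns 100 while connectionsOpened(100, false) returns 1, which is the asymmetry between the two code paths that the poller runs into.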

Result: ~10 orphaned Database connections per second at the default 100 ms poll interval. These accumulate faster than GC can reclaim them, exhausting OS file-handle or native-heap limits. The process crashes — typically with exit code 5 (V8 fatal error / resource exhaustion) — after 5–10 minutes.
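To put the rate in perspective, here is a rough back-of-the-envelope calculation. It assumes exactly one orphaned connection per poll tick and that none are reclaimed in the meantime, so it is an upper bound rather than a measurement; orphanedConnections is a hypothetical helper.

```typescript
// Upper bound on orphaned connections, assuming one per poll tick and
// that none are reclaimed in the meantime (GC will in practice free some).
function orphanedConnections(pollIntervalMs: number, elapsedMs: number): number {
  if (pollIntervalMs <= 0) {
    throw new RangeError("pollIntervalMs must be positive");
  }
  return Math.floor(elapsedMs / pollIntervalMs);
}

const perSecond = orphanedConnections(100, 1000);                  // 10
const afterFiveMinutes = orphanedConnections(100, 5 * 60 * 1000);  // 3000
const afterTenMinutes = orphanedConnections(100, 10 * 60 * 1000);  // 6000
```

Comparing these figures with the process's open-file limit gives a feel for why the crash shows up within minutes rather than hours.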

The leak occurs even when the queue is empty (zero pending messages), because the poll loop still opens a write transaction on every tick to check for work.

Note: client.batch() does not have this behaviour — it uses #getDb() and never nulls #db.

How to observe it

Set WORKFLOW_DEBUG=true before world.start(). If the schema has not been applied, you will see a flood of [workflow:debug] Poll error: ... lines (one every 100 ms). If it has been applied, the connections drain silently.

Fix

Replace the multi-step sequence (transaction → SELECT → UPDATE → commit/rollback) with a single atomic UPDATE … RETURNING statement issued via client.execute():

UPDATE queue_messages
SET status = 'processing'
WHERE message_id = (
  SELECT message_id FROM queue_messages
  WHERE status = 'pending' AND (not_before IS NULL OR not_before <= ?)
  ORDER BY created_at ASC
  LIMIT 1
)
RETURNING message_id, queue_name, payload, attempt

client.execute() calls #getDb() without setting #db = null, so the single cached connection is reused on every poll tick — no orphaned connection, no leak.
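A minimal sketch of how the claim step might look when written against client.execute(). This is not the code from this PR's diff: claimNextMessage, Executor and ClaimedMessage are hypothetical names, the Executor interface only mirrors the execute({ sql, args }) shape so that the example is self-contained, and the type of the not_before bound (string or number) is an assumption about the schema.

```typescript
// Minimal structural types so the sketch is self-contained. They mirror only
// the part of the client used here; all type and function names are hypothetical.
type Row = Record<string, unknown>;

interface Executor {
  execute(stmt: { sql: string; args: unknown[] }): Promise<{ rows: Row[] }>;
}

interface ClaimedMessage {
  messageId: string;
  queueName: string;
  payload: string;
  attempt: number;
}

// Same statement as in the PR description.
const CLAIM_SQL = `
UPDATE queue_messages
SET status = 'processing'
WHERE message_id = (
  SELECT message_id FROM queue_messages
  WHERE status = 'pending' AND (not_before IS NULL OR not_before <= ?)
  ORDER BY created_at ASC
  LIMIT 1
)
RETURNING message_id, queue_name, payload, attempt`;

// Claim the oldest due message in one statement.
// Resolves to null when no message is pending.
async function claimNextMessage(
  client: Executor,
  now: string | number, // assumption: same representation as the not_before column
): Promise<ClaimedMessage | null> {
  const result = await client.execute({ sql: CLAIM_SQL, args: [now] });
  const row = result.rows[0];
  if (row === undefined) {
    return null;
  }
  return {
    messageId: String(row["message_id"]),
    queueName: String(row["queue_name"]),
    payload: String(row["payload"]),
    attempt: Number(row["attempt"]),
  };
}
```

Because an empty queue simply yields zero rows, the idle path costs one statement on the cached connection and needs no commit or rollback handling.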

Atomicity is preserved: SQLite wraps every top-level DML statement in an implicit transaction, so the subquery SELECT and the outer UPDATE are indivisible. Two concurrent pollers cannot claim the same message.

Testing

TypeScript type-check passes (pnpm --filter @workflow-worlds/turso typecheck). Existing test suite applies.

Impact

  • Eliminates process crashes (exit code 5) that occur after 5–10 minutes of idle queue polling
  • Reduces per-poll overhead (one execute() instead of transaction() + two execute() calls + commit())
  • No behaviour change for callers

…revent connection leak in queue poller

@libsql/client's transaction() method sets this.#db = null after BEGIN
so that concurrent execute() calls use a separate connection. After
commit/rollback, the detached Database object is abandoned to GC. At the
default 100ms poll interval this orphans ~10 native SQLite connections
per second — more than GC can reclaim — exhausting OS file-handle or
native-heap limits and crashing the host process (exit code 5) after
5-10 minutes of idle operation.

Replacing the three-step SELECT + UPDATE + COMMIT with a single atomic
UPDATE...RETURNING via client.execute() preserves atomicity through
SQLite's implicit per-statement transaction while reusing the single
cached connection (execute() calls #getDb() without nulling #db).

Fixes: https://github.com/mizzle-dev/workflow-worlds/issues/TBD
@CalmProton force-pushed the fix/turso-queue-connection-leak branch from a3b0c3d to 987bb4f on April 20, 2026 20:07